Linguistic scope-based and biological event-based speculation and negation annotations in the BioScope and Genia Event corpora
Abstract
BACKGROUND The treatment of negation and hedging in natural language processing has received much interest recently, especially in the biomedical domain. However, open-access corpora annotated for negation and/or speculation are scarce for training and testing applications, and those that exist sometimes follow different design principles. This paper compares the annotation principles of the two largest corpora annotated for negation and speculation, BioScope and Genia Event. BioScope marks linguistic cues and their scopes for negation and hedging, while Genia marks biological events for uncertainty and/or negation.

RESULTS Differences between the annotations of the two corpora are categorized thematically and the frequency of each category is estimated. The largest share of differences arises because scopes, which cover text spans, are anchored to the key events, so every argument of those events (including events within events) falls under the scope as well. In contrast, Genia treats the modality of events within events independently.

CONCLUSIONS The analysis of multiple layers of annotation (linguistic scopes and biological events) showed that detecting negation/hedge keywords and their scopes can contribute to determining the modality of key events (denoted by the main predicate). Detecting the negation and speculation status of events within events, on the other hand, requires additional syntax-based rules that examine the dependency path between the modality cue and the event cue.
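The contrast between scope-based and event-based annotation can be illustrated with a minimal sketch. It is not the authors' method or either corpus's format: the `Span`, `Event`, `span_of`, and `label_events` names, the character-offset representation, and the example sentence are all assumptions made for illustration. A key event whose trigger lies inside a cue's scope inherits the cue's modality, while an event within an event is left undecided (`None`), since the abstract states that such cases need additional dependency-path rules, which are not implemented here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset

    def contains(self, other: "Span") -> bool:
        # True when `other` lies entirely inside this span.
        return self.start <= other.start and other.end <= self.end


@dataclass(frozen=True)
class Event:
    name: str       # hypothetical identifier for the event
    trigger: Span   # text span of the event trigger word
    is_key: bool    # True if the event is denoted by the main predicate


def span_of(text: str, phrase: str) -> Span:
    # Locate the first occurrence of `phrase` in `text`.
    start = text.index(phrase)
    return Span(start, start + len(phrase))


def label_events(scope: Span, events: List[Event],
                 modality: str) -> Dict[str, Optional[str]]:
    # Key events inside the scope take the cue's modality.
    # Nested events inside the scope are left as None (undecided).
    # Events outside the scope are omitted.
    labels: Dict[str, Optional[str]] = {}
    for event in events:
        if not scope.contains(event.trigger):
            continue
        labels[event.name] = modality if event.is_key else None
    return labels


# Invented example: the hedge cue "may" with a scope running to the end
# of the sentence (an assumed scope, chosen only for this illustration).
sentence = "X may inhibit expression of Y"
scope = Span(sentence.index("may"), len(sentence))
events = [
    Event("inhibit", span_of(sentence, "inhibit"), is_key=True),
    Event("expression", span_of(sentence, "expression"), is_key=False),
]
labels = label_events(scope, events, "speculation")
```

Here both triggers fall inside the scope, but only the key event receives a label. Leaving the nested event as `None` instead of copying the label reflects the point that scope membership alone does not settle its modality.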
Similar resources
Linguistic scope-based and biological event-based speculation and negation annotations in the Genia Event and BioScope corpora
Background: The treatment of negation and hedging in natural language processing has received much interest recently, especially in the biomedical domain. However, open access corpora annotated for negation and/or speculation are hardly available for training and testing applications, and even if they are, they sometimes follow different design principles. In this paper, the annotation principl...
Automatic Extraction of Lexico-Syntactic Patterns for Detection of Negation and Speculation Scopes
Detecting the linguistic scope of negated and speculated information in text is an important Information Extraction task. This paper presents ScopeFinder, a linguistically motivated rule-based system for the detection of negation and speculation scopes. The system rule set consists of lexico-syntactic patterns automatically extracted from a corpus annotated with negation/speculation cues and th...
The BioScope corpus: annotation for negation, uncertainty and their scope in biomedical texts
This article reports on a corpus annotation project that has produced a freely available resource for research on handling negation and uncertainty in biomedical texts (we call this corpus the BioScope corpus). The corpus consists of three parts, namely medical free texts, biological full papers and biological scientific abstracts. The dataset contains annotations at the token level for negativ...
Speculation and Negation Scope Detection via Convolutional Neural Networks
Speculation and negation are important information to identify text factuality. In this paper, we propose a Convolutional Neural Network (CNN)-based model with probabilistic weighted average pooling to address speculation and negation scope detection. In particular, our CNN-based model extracts those meaningful features from various syntactic paths between the cues and the candidate tokens in b...
Bridging the Gap Between Scope-based and Event-based Negation/Speculation Annotations: A Bridge Not Too Far
We study two approaches to the marking of extra-propositional aspects of statements in text: the task-independent cue-and-scope representation considered in the CoNLL-2010 Shared Task, and the tagged-event representation applied in several recent event extraction tasks. Building on shared task resources and the analyses from state-of-the-art systems representing the two broad lines of research,...